MLP module
M3-Net: A Cost-Effective Graph-Free MLP-Based Model for Traffic Prediction
Jin, Guangyin, Lai, Sicong, Hao, Xiaoshuai, Zhang, Mingtao, Zhang, Jinlei
Accurate traffic prediction is a fundamental and crucial task in the development of modern intelligent transportation systems. Most mainstream methods that have achieved breakthroughs in traffic prediction rely on spatio-temporal graph neural networks, spatio-temporal attention mechanisms, and similar designs. The main challenge of existing deep learning approaches is that they either depend on a complete traffic network structure or require intricate model designs to capture complex spatio-temporal dependencies. These limitations pose significant obstacles to the efficient deployment and operation of deep learning models on large-scale datasets. To address them, we propose M3-Net, a cost-effective, graph-free Multilayer Perceptron (MLP) based model for traffic prediction. Our model not only employs time-series and spatio-temporal embeddings for efficient feature processing but is also the first to introduce an MLP-Mixer architecture with a mixture-of-experts (MoE) mechanism. Extensive experiments on multiple real-world datasets demonstrate the superiority of the proposed model in both prediction performance and lightweight deployment. Our code is available at https://github.com/jinguangyin/M3_NET
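The abstract above names an MLP-Mixer architecture with a mixture-of-experts (MoE) mechanism but gives no details. Below is a minimal, generic MoE gating sketch in plain Python; the function names (`softmax`, `moe_mix`) and the dot-product gate are illustrative assumptions, not M3-Net's actual design:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of gate scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_mix(x, experts, gate_weights):
    """Combine expert MLPs with a softmax gate.

    x            : input feature vector
    experts      : list of callables, each mapping a vector to a vector
    gate_weights : one weight vector per expert; the gate score here is
                   a simple dot product with the input
    """
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    gates = softmax(scores)
    outputs = [f(x) for f in experts]
    # gate-weighted sum of expert outputs, dimension by dimension
    return [sum(g * out[i] for g, out in zip(gates, outputs))
            for i in range(len(outputs[0]))]
```

With equal gate scores the result is simply the average of the expert outputs, which makes the mechanism easy to sanity-check.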
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs
Liu, Jiahao, Wang, Zijian, Zhao, Kuo, Hu, Dong
Knowledge editing has emerged as an efficient approach for updating factual knowledge in large language models (LLMs), typically achieved by first locating key knowledge-storage modules and then modifying their parameters. However, most existing methods focus exclusively on updating the weights of Multi-Layer Perceptron (MLP) modules, which are commonly identified as the primary repositories of factual information. Other important components, such as attention (Attn) modules--one of the core modules in LLMs--are often ignored during editing. This biased allocation of updates can leave residual outdated knowledge in the model and limit the effectiveness of knowledge editing. In this paper, we conduct comprehensive and systematic knowledge localization experiments on advanced LLMs, revealing that Attn modules play a substantial role in factual knowledge storage and retrieval, especially in earlier layers. Building on these insights, we propose IntAttn-Edit, a novel method that extends the associative memory paradigm to jointly update both MLP and Attn modules. Our approach employs a knowledge balancing strategy that proportionally allocates update magnitudes based on each module's measured contribution to knowledge storage. Extensive experiments on popular benchmarks demonstrate that IntAttn-Edit consistently achieves superior results over existing methods, delivering higher edit success, improved generalization, and robust knowledge preservation. Further empirical analysis shows that our knowledge balancing strategy enables the editing performance to remain within the optimal range across different settings.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
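The knowledge balancing strategy described above allocates update magnitudes in proportion to each module's measured contribution to knowledge storage. A minimal sketch of such proportional allocation, with a hypothetical `balance_updates` helper (the paper's exact formulation is not given in the abstract):

```python
def balance_updates(total_update, contributions):
    """Split an update budget across modules in proportion to each
    module's measured contribution to knowledge storage.

    total_update  : overall update magnitude to distribute
    contributions : dict mapping module name -> contribution score
    """
    total = sum(contributions.values())
    return {name: total_update * c / total
            for name, c in contributions.items()}
```

For example, if localization finds MLP modules contributing three times as much as attention modules, the MLP share of a unit update budget would be 0.75.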
Training-free Truthfulness Detection via Value Vectors in LLMs
Liu, Runheng, Huang, Heyan, Xiao, Xingchen, Wu, Zhijing
Large language models often generate factually incorrect outputs, motivating efforts to detect the truthfulness of their content. Most existing approaches rely on training probes over internal activations, but these methods suffer from scalability and generalization issues. A recent training-free method, NoVo, addresses this challenge by exploiting statistical patterns from the model itself. However, it focuses exclusively on attention mechanisms, potentially overlooking the MLP module, a core component of Transformer models known to support factual recall. In this paper, we show that certain value vectors within MLP modules exhibit truthfulness-related statistical patterns. Building on this insight, we propose TruthV, a simple and interpretable training-free method that detects content truthfulness by leveraging these value vectors. On the NoVo benchmark, TruthV significantly outperforms both NoVo and log-likelihood baselines, demonstrating that MLP modules, despite being neglected in prior training-free efforts, encode rich and useful signals for truthfulness detection. These findings offer new insights into how truthfulness is internally represented in LLMs and motivate further research on scalable and interpretable truthfulness detection.
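As a rough illustration of scoring content via MLP value vectors: in a Transformer MLP, the hidden activations are the coefficients that scale each value vector (each row of the down-projection). The toy function below averages the coefficients of a chosen subset of value vectors; the function, its inputs, and the scoring rule are assumptions for exposition, not TruthV's actual statistic:

```python
def value_vector_score(hidden_acts, selected_ids):
    """Toy truthfulness score: average activation strength of a set of
    MLP value vectors assumed (hypothetically) to correlate with
    truthful content.

    hidden_acts  : per-position lists of MLP hidden activations
                   (the coefficients that scale each value vector)
    selected_ids : indices of value vectors flagged as truth-sensitive
    """
    per_pos = [sum(acts[i] for i in selected_ids) / len(selected_ids)
               for acts in hidden_acts]
    # average over sequence positions
    return sum(per_pos) / len(per_pos)
```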
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms
Choe, Minyeong, Cho, Haehyun, Seo, Changho, Kim, Hyunil
Understanding how Transformer-based language models store and retrieve factual associations is critical for improving interpretability and enabling targeted model editing. Prior work, primarily on GPT-style models, has identified MLP modules in early layers as key contributors to factual recall. However, it remains unclear whether these findings generalize across different autoregressive architectures. To address this, we conduct a comprehensive evaluation of factual recall across several models -- including GPT, LLaMA, Qwen, and DeepSeek -- analyzing where and how factual information is encoded and accessed. Consequently, we find that Qwen-based models behave differently from previous patterns: attention modules in the earliest layers contribute more to factual recall than MLP modules. Our findings suggest that even within the autoregressive Transformer family, architectural variations can lead to fundamentally different mechanisms of factual recall.
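A simple way to probe where factual recall lives, in the spirit of the analysis above, is to ablate a module and measure the score drop on a recall probe. The `module_contributions` helper below is a hypothetical stand-in for the paper's (unspecified) localization procedure:

```python
def module_contributions(clean_score, ablated_scores):
    """Rank modules by factual-recall contribution via ablation:
    contribution = clean score minus score with that module disabled.

    clean_score    : probe accuracy of the intact model
    ablated_scores : dict mapping module name -> accuracy with it ablated
    """
    return {name: clean_score - s for name, s in ablated_scores.items()}
```

A large drop when ablating an early-layer attention module (as reported for Qwen-based models) would show up as a high contribution value for that module.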
Augmented Shortcuts for Vision Transformers (Supplementary Material)
Tang, Yehui
Following the observation that shortcut connections exist in both MSA and MLP modules, the proposed augmented shortcuts are also embedded into the MLP module (Eq. 10 in the main paper). Eq. (S.4) gives the diversity in a block of the Aug-ViT model.
SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models
In spite of the strong performance achieved by LLMs, their deployment costs are often unaffordable. For LLM compression, gradient-based pruning methods show promising effectiveness. However, in these methods, computing gradients against one-hot labels ignores the model's potential predictions for other words, discarding information that is key to the generative capability of the original model. To address this issue, we introduce a self-distillation loss during the pruning phase (rather than during post-training) to fully exploit the predictions of the original model, thereby obtaining more accurate gradient information for pruning. Moreover, we find that, compared to attention modules, the predictions of an LLM are less sensitive to multilayer perceptron (MLP) modules, which account for more than $5 \times$ as many parameters (LLaMA3.2-1.2B). We therefore focus pruning on the MLP modules, significantly compressing the LLM without obvious performance degradation. Experimental results on extensive zero-shot benchmarks demonstrate that our method significantly outperforms existing pruning methods. Furthermore, it achieves very competitive performance among 1B-scale open-source LLMs. The source code and trained weights are available at https://github.com/visresearch/SDMPrune.
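The contrast between one-hot cross-entropy and a self-distillation loss can be made concrete: the former uses only the probability of the ground-truth token, while the latter matches the original model's full predictive distribution. A minimal sketch (generic KL-based distillation, not necessarily SDMPrune's exact loss):

```python
import math

def one_hot_ce(student_probs, label):
    # standard cross-entropy against a one-hot label:
    # only the ground-truth entry influences the loss
    return -math.log(student_probs[label])

def self_distill_loss(teacher_probs, student_probs):
    """KL(teacher || student): uses the teacher's full predictive
    distribution, so gradients carry information about every token
    in the vocabulary, not just the ground-truth one."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)
```

When the pruned model matches the original model's distribution exactly, the distillation loss is zero, whereas the one-hot loss still penalizes any probability mass placed on other tokens.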
SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling
Zhao, Anhao, Ye, Fanghua, Fan, Yingqi, Tong, Junlong, Fei, Zhiwei, Su, Hui, Shen, Xiaoyu
Large language models (LLMs) achieve remarkable performance across tasks but incur substantial computational costs due to their deep, multi-layered architectures. Layer pruning has emerged as a strategy to alleviate these inefficiencies, but conventional static pruning methods overlook two critical dynamics inherent to LLM inference: (1) horizontal dynamics, where token-level heterogeneity demands context-aware pruning decisions, and (2) vertical dynamics, where the distinct functional roles of MLP and self-attention layers necessitate component-specific pruning policies. We introduce SkipGPT, a dynamic layer pruning framework designed to optimize computational resource allocation through two core innovations: (1) global token-aware routing to prioritize critical tokens, and (2) decoupled pruning policies for MLP and self-attention components. To mitigate training instability, we propose a two-stage optimization paradigm: first, a disentangled training phase that learns routing strategies via soft parameterization to avoid premature pruning decisions, followed by parameter-efficient LoRA fine-tuning to restore performance impacted by layer removal. Extensive experiments demonstrate that SkipGPT reduces over 40% of model parameters while matching or exceeding the performance of the original dense model across benchmarks. By harmonizing dynamic efficiency with preserved expressivity, SkipGPT advances the practical deployment of scalable, resource-aware LLMs. Our code is publicly available at: https://github.com/EIT-NLP/SkipGPT.
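The global token-aware routing idea can be sketched as a per-token skip decision: tokens whose router score falls below a threshold bypass the module through the residual path. The `route_tokens` helper and its thresholding rule are illustrative assumptions, not SkipGPT's learned router:

```python
def route_tokens(tokens, router_score, threshold, module):
    """Per-token dynamic skipping: run `module` only for tokens whose
    router score clears the threshold; pass the rest through unchanged
    (the residual path of a Transformer block)."""
    return [module(t) if router_score(t) >= threshold else t
            for t in tokens]
```

Decoupled pruning policies would correspond to running this routing with separate routers (and thresholds) for the MLP and self-attention components of each layer.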
FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Shen, Xuan, Ma, Weize, Zhou, Yufa, Tang, Enhao, Xie, Yanyue, Li, Zhengang, Gong, Yifan, Wang, Quanyi, Ding, Henghui, Wang, Yiwei, Wang, Yanzhi, Zhao, Pu, Lin, Jun, Gu, Jiuxiang
Auto-regressive (AR) models, initially successful in language generation, have recently shown promise in visual generation tasks due to their superior sampling efficiency. Unlike image generation, video generation requires a substantially larger number of tokens to produce coherent temporal frames, resulting in significant overhead during the decoding phase. Our key observations are: (i) MLP modules in the decode phase dominate the inference latency, and (ii) there exists high temporal redundancy in the MLP outputs of adjacent frames. In this paper, we propose the \textbf{FastCar} framework to accelerate the decode phase of AR video generation by exploiting this temporal redundancy. The Temporal Attention Score (TAS) is proposed to determine whether to apply the replay strategy (\textit{i.e.}, reusing cached MLP outputs from the previous frame to reduce redundant computations), with detailed theoretical analysis and justification. We also develop a hardware accelerator on FPGA with Dynamic Resource Scheduling (DRS) based on TAS to enable better resource utilization and faster inference. Experimental results demonstrate the effectiveness of our method, which outperforms traditional sparse attention approaches with more than 2.1x decoding speedup and higher energy efficiency on the edge. Furthermore, by combining FastCar with sparse attention, we can boost the performance of sparse attention with alleviated drifting, demonstrating our unique advantages for high-resolution and long-duration video generation. Code: https://github.com/shawnricecake/fast-car
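The replay strategy described above can be sketched as a cache lookup gated by the temporal attention score: positions whose TAS is high reuse the previous frame's MLP output instead of recomputing it. The `decode_frame` helper below is a simplified sketch under these assumptions, not FastCar's implementation:

```python
def decode_frame(frame_tokens, tas, threshold, mlp, cache):
    """Replay strategy: when the temporal attention score (TAS) for a
    token position is high, reuse the cached MLP output from the
    previous frame; otherwise recompute and refresh the cache."""
    outputs = []
    for pos, tok in enumerate(frame_tokens):
        if pos in cache and tas(pos) >= threshold:
            outputs.append(cache[pos])  # replay cached MLP result
        else:
            out = mlp(tok)              # recompute and refresh cache
            cache[pos] = out
            outputs.append(out)
    return outputs
```

On a second frame with high TAS everywhere, every position replays the first frame's outputs, which is where the decoding speedup would come from.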
Numerical Pruning for Efficient Autoregressive Models
Shen, Xuan, Song, Zhao, Zhou, Yufa, Chen, Bo, Liu, Jing, Zhang, Ruiyi, Rossi, Ryan A., Tan, Hao, Yu, Tong, Chen, Xiang, Zhou, Yufan, Sun, Tong, Zhao, Pu, Wang, Yanzhi, Gu, Jiuxiang
Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. In addition, we propose a compensation algorithm to recover the performance of the pruned model. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.
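As an illustration of second-order, training-free pruning scores (the paper's Newton-derived score is not specified in the abstract), the sketch below uses the classic OBD-style saliency $h_{ii} w_i^2 / 2$: the estimated loss increase from zeroing weight $w_i$, given the diagonal Hessian entry $h_{ii}$:

```python
def second_order_saliency(weights, hessian_diag):
    """OBD-style saliency: expected loss increase from zeroing weight i,
    approximated by the second-order Taylor term h_ii * w_i**2 / 2.
    (Illustrative stand-in, not the paper's actual Newton-based score.)"""
    return [h * w * w / 2.0 for w, h in zip(weights, hessian_diag)]

def prune_lowest(weights, hessian_diag, k):
    # zero out the k weights with the smallest saliency (training-free)
    sal = second_order_saliency(weights, hessian_diag)
    to_zero = set(sorted(range(len(weights)), key=lambda i: sal[i])[:k])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]
```

A compensation step, as the abstract mentions, would then adjust the surviving weights to offset the loss introduced by the zeroed ones.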